{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# **Hypothetical Document Embeddings (HyDE) RAG**\n",
        "\n",
        "HyDE operates by creating hypothetical document embeddings that represent ideal documents relevant to a given query. This method contrasts with conventional RAG systems, which typically rely on the similarity between a user's query and existing document embeddings. By generating these hypothetical embeddings, HyDE effectively guides the retrieval process towards documents that are more likely to contain pertinent information.\n",
        "\n",
        "Research Paper: [HyDE](https://arxiv.org/pdf/2212.10496)"
      ],
      "metadata": {
        "id": "-Lc5dcnnrPeB"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Initial Setup**"
      ],
      "metadata": {
        "id": "Lq2dw19XrZ-e"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "SARNqNWDq6Bk"
      },
      "outputs": [],
      "source": [
        "! pip install --q -U athina langchain-weaviate"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from google.colab import userdata\n",
        "os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
        "os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')\n",
        "os.environ['WEAVIATE_API_KEY'] = userdata.get('WEAVIATE_API_KEY')\n"
      ],
      "metadata": {
        "id": "xqGvkL6TrZcZ"
      },
      "execution_count": 3,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Indexing**"
      ],
      "metadata": {
        "id": "E0jWhkpOrz9s"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# load embedding model\n",
        "from langchain_openai import OpenAIEmbeddings\n",
        "embeddings = OpenAIEmbeddings()"
      ],
      "metadata": {
        "id": "qLfaQO3Krzts"
      },
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# load data\n",
        "from langchain.document_loaders import CSVLoader\n",
        "loader = CSVLoader(\"./context.csv\")\n",
        "documents = loader.load()"
      ],
      "metadata": {
        "id": "WEa2xJmBr339"
      },
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# split documents\n",
        "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
        "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
        "documents = text_splitter.split_documents(documents)"
      ],
      "metadata": {
        "id": "6VPx8W_kr7T_"
      },
      "execution_count": 6,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Weaviate Vector Database**"
      ],
      "metadata": {
        "id": "akqJjsXGYQZO"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create vectorstore using weaviate\n",
        "import weaviate\n",
        "from weaviate.classes.init import Auth\n",
        "import os\n",
        "\n",
        "wcd_url = 'your_url'\n",
        "\n",
        "client = weaviate.connect_to_weaviate_cloud(\n",
        "    cluster_url=wcd_url,\n",
        "    auth_credentials=Auth.api_key(os.environ['WEAVIATE_API_KEY']),\n",
        "    headers={'X-OpenAI-Api-key': os.environ[\"OPENAI_API_KEY\"]}\n",
        ")"
      ],
      "metadata": {
        "id": "I7VqD_cdhASC"
      },
      "execution_count": 7,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create vectorstore\n",
        "from langchain_weaviate.vectorstores import WeaviateVectorStore\n",
        "vectorstore = WeaviateVectorStore.from_documents(documents,embedding=OpenAIEmbeddings(),client=client, index_name=\"your_collection_name\",text_key=\"text\")"
      ],
      "metadata": {
        "id": "r0_LpjKchFEJ"
      },
      "execution_count": 8,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# checking similarity search\n",
        "vectorstore.similarity_search(\"world war II\", k=3)"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fmgD1vmarOEe",
        "outputId": "7615a067-4da3-4797-e77b-822e06dcc959"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[Document(metadata={'source': './context.csv', 'row': 36.0}, page_content='context: [\"The European theatre of World War II was one of the two main theatres of combat during World War II. It saw heavy fighting across Europe for almost six years, starting with Germany\\'s invasion of Poland on 1 September 1939 and ending with the Western Allies conquering most of Western Europe, the Soviet Union conquering most of Eastern Europe and Germany\\'s unconditional surrender on 8 May 1945 although fighting continued elsewhere in Europe until 25 May. On 5 June 1945, the Berlin'),\n",
              " Document(metadata={'source': './context.csv', 'row': 59.0}, page_content='war to end all wars\" due to their perception of its then-unparalleled scale, devastation, and loss of life. After World War II\\''),\n",
              " Document(metadata={'source': './context.csv', 'row': 59.0}, page_content='similarly wrote, \"Some wars name themselves. This is the Great War.\" Contemporary Europeans also referred to it as \"the war to end war\" and it was also described as \"the war to end all wars\" due to their perception of its then-unparalleled scale, devastation, and loss of life. After World War II began in 1939, the terms (often abbreviated as WWI or WW1) became more standard, with British Empire historians, including Canadians, favouring \"The First World War\" and Americans \"World War I\".\\\\n\\\\n\\\\n==')]"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# # call vectorstore fron weaviate cloud\n",
        "# vectorstore = WeaviateVectorStore(client=client,index_name=\"your_collection_name\",embedding=OpenAIEmbeddings(),text_key=\"text\")"
      ],
      "metadata": {
        "id": "BhsHwu-jhJJ1"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Chromadb (Optional)**"
      ],
      "metadata": {
        "id": "ZeUFP3fNYqIm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# # optional vectorstore\n",
        "# !pip install chromadb\n",
        "# # create vectorstore\n",
        "# from langchain.vectorstores import Chroma\n",
        "# vectorstore = Chroma.from_documents(documents, embeddings)"
      ],
      "metadata": {
        "id": "2OAXj8IKr-e1"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Retriever**"
      ],
      "metadata": {
        "id": "nLpKXT9lsFCm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create retriever\n",
        "retriever = vectorstore.as_retriever()"
      ],
      "metadata": {
        "id": "YyHutNF4sCgG"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Hypothetical Answer Chain**"
      ],
      "metadata": {
        "id": "pu8nGFrfsVF5"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create llm\n",
        "from langchain_openai import ChatOpenAI\n",
        "llm = ChatOpenAI()"
      ],
      "metadata": {
        "id": "bplMyS25sg-T"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# chain without the retriever\n",
        "from langchain_core.output_parsers import StrOutputParser\n",
        "from langchain_core.prompts import ChatPromptTemplate\n",
        "from langchain_openai import ChatOpenAI\n",
        "\n",
        "template =\"\"\"\n",
        "You are a helpful assistant that answers questions.\n",
        "Question: {input}\n",
        "Answer:\n",
        "\"\"\"\n",
        "prompt = ChatPromptTemplate.from_messages(\n",
        "    [\n",
        "        (\"system\", template),\n",
        "        (\"human\", \"{input}\"),\n",
        "    ]\n",
        ")\n",
        "qa_no_context = prompt | llm | StrOutputParser()"
      ],
      "metadata": {
        "id": "ddoXg_TSsQzy"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# response\n",
        "question = 'how does interlibrary loan work'\n",
        "answer = qa_no_context.invoke({\"input\": question})\n",
        "answer"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 139
        },
        "id": "Gouisxtssl55",
        "outputId": "8316e658-0885-4451-f9ae-f730a46f8233"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "\"Interlibrary loan is a service offered by libraries that allows patrons to borrow materials from other libraries. Here is how it typically works:\\n\\n1. **Request:** If a patron needs a book, article, or other material that their local library does not have, they can request it through interlibrary loan.\\n  \\n2. **Search:** The library staff will search for a library that owns the requested material and is willing to lend it.\\n  \\n3. **Request Submission:** Once a lending library is found, the request is submitted, and the material is shipped to the patron's local library.\\n  \\n4. **Pickup:** The patron can then pick up the material from their library, usually for a limited loan period.\\n  \\n5. **Return:** After using the material, the patron returns it to their library, which then returns it to the lending library.\\n\\nThere may be fees associated with interlibrary loan, depending on the policies of the lending library or the patron's local library. Interlibrary loan is a valuable service that allows patrons to access a wide range of materials beyond what their local library may have in its collection.\""
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 17
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Combined RAG Chain**"
      ],
      "metadata": {
        "id": "88kCzWM5tHyv"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# response with context\n",
        "retrieval_chain = qa_no_context | retriever\n",
        "retrieved_docs = retrieval_chain.invoke({\"input\":question})"
      ],
      "metadata": {
        "id": "MStndz9DtRTv"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "template = \"\"\"\n",
        "You are a helpful assistant that answers questions based on the provided context.\n",
        "Use the provided context to answer the question.\n",
        "Question: {input}\n",
        "Context: {context}\n",
        "\"\"\"\n",
        "\n",
        "prompt = ChatPromptTemplate.from_template(template)\n",
        "\n",
        "final_rag_chain = (\n",
        "    prompt\n",
        "    | llm\n",
        "    | StrOutputParser()\n",
        ")"
      ],
      "metadata": {
        "id": "9qsDOirqtBcU"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# final response\n",
        "final_rag_chain.invoke({\"context\":retrieved_docs,\"input\":question})"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 87
        },
        "id": "_XqIxnRRtiRf",
        "outputId": "61d912e1-4255-46d1-f851-3b0735d8776f"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'Interlibrary loan works by allowing patrons of one library to borrow physical materials or receive electronic documents that are held by another library. The borrowing library identifies potential lending libraries with the desired item, and the lending library delivers the item either physically or electronically. The borrowing library then receives the item, delivers it to their patron, and arranges for its return if necessary. In some cases, fees may accompany interlibrary loan services. Libraries negotiate for interlibrary loan eligibility, especially for digital materials like ebooks, through legal, technical, and licensing aspects.'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Preparing Data for Evaluation**"
      ],
      "metadata": {
        "id": "n3Whi8wAt9gG"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# create dataset\n",
        "question = [\"how does interlibrary loan work\"]\n",
        "response = []\n",
        "contexts = []\n",
        "\n",
        "# Inference\n",
        "for query in question:\n",
        "  response.append(final_rag_chain.invoke({\"context\":retrieved_docs,\"input\":query}))\n",
        "  contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])\n",
        "\n",
        "# To dict\n",
        "data = {\n",
        "    \"query\": question,\n",
        "    \"response\": response,\n",
        "    \"context\": contexts,\n",
        "}"
      ],
      "metadata": {
        "id": "wPK9Tb5Dt_qa"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create dataset\n",
        "from datasets import Dataset\n",
        "dataset = Dataset.from_dict(data)"
      ],
      "metadata": {
        "id": "JwIkrj3QuRyX"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# create dataframe\n",
        "import pandas as pd\n",
        "df = pd.DataFrame(dataset)"
      ],
      "metadata": {
        "id": "Buq9tERBuTfw"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "df"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 89
        },
        "id": "B9rW-W2WuVJ0",
        "outputId": "10144a8b-0e99-47b6-b2f8-c57865fdc68b"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                             query  \\\n",
              "0  how does interlibrary loan work   \n",
              "\n",
              "                                            response  \\\n",
              "0  Interlibrary loan works by enabling patrons of...   \n",
              "\n",
              "                                             context  \n",
              "0  [Procedures and methods ==\\n\\nAfter receiving ...  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-6b16f352-6311-44d1-9889-b513a96412ce\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>query</th>\n",
              "      <th>response</th>\n",
              "      <th>context</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>how does interlibrary loan work</td>\n",
              "      <td>Interlibrary loan works by enabling patrons of...</td>\n",
              "      <td>[Procedures and methods ==\\n\\nAfter receiving ...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-6b16f352-6311-44d1-9889-b513a96412ce')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-6b16f352-6311-44d1-9889-b513a96412ce button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-6b16f352-6311-44d1-9889-b513a96412ce');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "  <div id=\"id_e33886f9-e4ab-4750-b0d9-0dab2bdeed86\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_e33886f9-e4ab-4750-b0d9-0dab2bdeed86 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('df');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df",
              "summary": "{\n  \"name\": \"df\",\n  \"rows\": 1,\n  \"fields\": [\n    {\n      \"column\": \"query\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"how does interlibrary loan work\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"response\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 1,\n        \"samples\": [\n          \"Interlibrary loan works by enabling patrons of one library to borrow physical materials or receive electronic documents from another library that holds the desired items. The borrowing library identifies potential lending libraries with the item, and the lending library delivers the item to the borrowing library either physically or electronically. If necessary, the borrowing library arranges for the return of the item. Fees may accompany interlibrary loan services, and libraries negotiate for eligibility to supply certain materials. With the increasing demand for digital materials, libraries are exploring the legal, technical, and licensing aspects of lending and borrowing ebooks through interlibrary loan.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"context\",\n      \"properties\": {\n        \"dtype\": \"object\",\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 24
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Convert to dictionary\n",
        "df_dict = df.to_dict(orient='records')\n",
        "\n",
        "# Convert context to list\n",
        "for record in df_dict:\n",
        "    if not isinstance(record.get('context'), list):\n",
        "        if record.get('context') is None:\n",
        "            record['context'] = []\n",
        "        else:\n",
        "            record['context'] = [record['context']]"
      ],
      "metadata": {
        "id": "L-Zigq3duX8u"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## **Evaluation in Athina AI**\n",
        "\n",
        "We will use **Context Relevancy** eval here. It Measures the relevancy of the retrieved context, calculated based on both the query and contexts. Please refer to our [documentation](https://docs.athina.ai/api-reference/evals/preset-evals/overview) for further details"
      ],
      "metadata": {
        "id": "T8jVatzLucDa"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# set api keys for Athina evals\n",
        "from athina.keys import AthinaApiKey, OpenAiApiKey\n",
        "OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))\n",
        "AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))"
      ],
      "metadata": {
        "id": "SdZTMcNpuchu"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# load dataset\n",
        "from athina.loaders import Loader\n",
        "dataset = Loader().load_dict(df_dict)"
      ],
      "metadata": {
        "id": "DRSvQehlugRJ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# evaluate\n",
        "from athina.evals import RagasContextRelevancy\n",
        "RagasContextRelevancy(model=\"gpt-4o\").run_batch(data=dataset).to_df()"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 462
        },
        "id": "ruU4Qq5Luh7S",
        "outputId": "bc6534f4-1b87-44f6-d5dd-5d1a066b25dd"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "evaluating with [context_relevancy]\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "100%|██████████| 1/1 [00:01<00:00,  1.82s/it]\n"
          ]
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "You can view your dataset at: https://app.athina.ai/develop/1db6cbc3-1096-4b1d-881e-db538858691d\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                             query  \\\n",
              "0  how does interlibrary loan work   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               context  \\\n",
              "0  [Procedures and methods ==\\n\\nAfter receiving a request from their patron, the borrowing library identifies potential lending libraries with the desired item. The lending library then delivers the item physically or electronically, and the borrowing library receives the item, delivers it to their patron, and if necessary, arranges for its return. In some cases, fees accompany interlibrary loan services. While the majority of interlibrary loan requests are now managed through semi-automated, ...   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              response  \\\n",
              "0  Interlibrary loan works by enabling patrons of one library to borrow physical materials or receive electronic documents from another library that holds the desired items. The borrowing library identifies potential lending libraries with the item, and the lending library delivers the item to the borrowing library either physically or electronically. If necessary, the borrowing library arranges for the return of the item. Fees may accompany interlibrary loan services, and libraries negotiate f...   \n",
              "\n",
              "  expected_response             display_name failed  \\\n",
              "0              None  Ragas Context Relevancy   None   \n",
              "\n",
              "                                                                                                                                                                        grade_reason  \\\n",
              "0  This metric is calulated by dividing the number of sentences in context that are relevant for answering the given query by the total number of sentences in the retrieved context   \n",
              "\n",
              "   runtime   model  ragas_context_relevancy  \n",
              "0     2477  gpt-4o                     0.04  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-5d0f8f5e-127a-4713-8db9-3dba86bfc0e3\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>query</th>\n",
              "      <th>context</th>\n",
              "      <th>response</th>\n",
              "      <th>expected_response</th>\n",
              "      <th>display_name</th>\n",
              "      <th>failed</th>\n",
              "      <th>grade_reason</th>\n",
              "      <th>runtime</th>\n",
              "      <th>model</th>\n",
              "      <th>ragas_context_relevancy</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>how does interlibrary loan work</td>\n",
              "      <td>[Procedures and methods ==\\n\\nAfter receiving a request from their patron, the borrowing library identifies potential lending libraries with the desired item. The lending library then delivers the item physically or electronically, and the borrowing library receives the item, delivers it to their patron, and if necessary, arranges for its return. In some cases, fees accompany interlibrary loan services. While the majority of interlibrary loan requests are now managed through semi-automated, ...</td>\n",
              "      <td>Interlibrary loan works by enabling patrons of one library to borrow physical materials or receive electronic documents from another library that holds the desired items. The borrowing library identifies potential lending libraries with the item, and the lending library delivers the item to the borrowing library either physically or electronically. If necessary, the borrowing library arranges for the return of the item. Fees may accompany interlibrary loan services, and libraries negotiate f...</td>\n",
              "      <td>None</td>\n",
              "      <td>Ragas Context Relevancy</td>\n",
              "      <td>None</td>\n",
              "      <td>This metric is calulated by dividing the number of sentences in context that are relevant for answering the given query by the total number of sentences in the retrieved context</td>\n",
              "      <td>2477</td>\n",
              "      <td>gpt-4o</td>\n",
              "      <td>0.04</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-5d0f8f5e-127a-4713-8db9-3dba86bfc0e3')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-5d0f8f5e-127a-4713-8db9-3dba86bfc0e3 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-5d0f8f5e-127a-4713-8db9-3dba86bfc0e3');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "repr_error": "Out of range float values are not JSON compliant: nan"
            }
          },
          "metadata": {},
          "execution_count": 28
        }
      ]
    }
  ]
}